Improve performance of extract operations #172
Steps to produce that database:
Here's the loop that's taking the time: `sqlite_utils/db.py`, lines 892 to 897 in `1ebffe1`.
The relevant code: `sqlite_utils/db.py`, lines 1244 to 1264 in `1ebffe1`.
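The snippets referenced above aren't reproduced in this thread, but the shape of the problem is presumably something like the sketch below: one lookup-or-insert query plus one `UPDATE` per row. The table and column names (`trees`, `species`, `species_id`) are hypothetical, not the actual sqlite-utils code.

```python
import sqlite3

def lookup_id(conn, value):
    # Get-or-create a row in the lookup table for this value.
    row = conn.execute(
        "SELECT id FROM species WHERE value = ?", (value,)
    ).fetchone()
    if row:
        return row[0]
    cursor = conn.execute("INSERT INTO species (value) VALUES (?)", (value,))
    return cursor.lastrowid

def slow_extract(conn):
    # One SELECT plus one UPDATE per row: with 680,000 rows this
    # per-row pattern is what makes the operation take minutes.
    rows = conn.execute("SELECT rowid, species FROM trees").fetchall()
    for rowid, value in rows:
        conn.execute(
            "UPDATE trees SET species_id = ? WHERE rowid = ?",
            (lookup_id(conn, value), rowid),
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trees (species TEXT, species_id INTEGER);
CREATE TABLE species (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
INSERT INTO trees (species) VALUES ('oak'), ('pine'), ('oak');
""")
slow_extract(conn)
```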
Batching those updates may have an effect. Or finding a way to skip the …
I ran …
I wonder if I could make this faster by separating it out into a few steps: …
The problem with this approach is that it's not compatible with progress bars - but if it's several times faster it's worth it.
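The list of steps isn't preserved in this thread, but a multi-step, set-based extract could plausibly look like the sketch below (table and column names hypothetical): create the lookup table, populate it with one `INSERT ... SELECT DISTINCT`, add the foreign key column, then rewrite every row in a single correlated `UPDATE` instead of one `UPDATE` per row from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trees (id INTEGER PRIMARY KEY, species TEXT);
INSERT INTO trees (species) VALUES ('oak'), ('pine'), ('oak');
""")

# Step 1: create the lookup table.
conn.execute("CREATE TABLE species (id INTEGER PRIMARY KEY, value TEXT UNIQUE)")
# Step 2: populate it with the distinct values in one statement.
conn.execute("INSERT INTO species (value) SELECT DISTINCT species FROM trees")
# Step 3: add the foreign key column.
conn.execute(
    "ALTER TABLE trees ADD COLUMN species_id INTEGER REFERENCES species(id)"
)
# Step 4: rewrite every row with a single set-based UPDATE.
conn.execute("""
    UPDATE trees SET species_id = (
        SELECT id FROM species WHERE species.value = trees.species
    )
""")
conn.commit()
# (A real implementation would then drop the old text column, e.g. by
# recreating the table, since older SQLite versions lack DROP COLUMN.)
```

The whole pass is a handful of statements regardless of row count, which is why a set-based rewrite can turn minutes into seconds - at the cost of having no per-row progress to report.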
Also, what would happen if new rows were added to the table while that command was running?
There's something to be said for making this operation pausable and resumable, especially if I'm going to make it available in a Datasette plugin at some point.
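One way a resumable per-row pass could work (a sketch, not anything in sqlite-utils: the `_extract_progress` checkpoint table and `items` schema are invented for illustration): process rows in `rowid` order and persist the highest rowid handled after each batch, so an interrupted run can restart after the checkpoint.

```python
import sqlite3

def process_rows(conn, handle_row, batch_size=100):
    # One-row checkpoint table records how far a previous run got.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS _extract_progress (last_rowid INTEGER)"
    )
    row = conn.execute("SELECT last_rowid FROM _extract_progress").fetchone()
    last_rowid = row[0] if row else 0
    while True:
        rows = conn.execute(
            "SELECT rowid, * FROM items WHERE rowid > ? ORDER BY rowid LIMIT ?",
            (last_rowid, batch_size),
        ).fetchall()
        if not rows:
            break
        for r in rows:
            handle_row(r)
            last_rowid = r[0]
        with conn:  # commit the checkpoint once per batch
            conn.execute("DELETE FROM _extract_progress")
            conn.execute(
                "INSERT INTO _extract_progress VALUES (?)", (last_rowid,)
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("a",), ("b",), ("c",)])
seen = []
process_rows(conn, lambda row: seen.append(row[0]), batch_size=2)
```

Iterating by `rowid` also gives a partial answer to the concurrent-inserts question above: rows appended during the run get higher rowids, so a later batch (or a resumed run) will still pick them up.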
Refs #172 - seems to give me about a 20% speedup.
My prototype of this knocked the time down from 10 minutes to 4 seconds, so I think the change is worth it!
Takes my test down from ten minutes to four seconds.
This command took about 12 minutes (against a 150MB file with 680,000 rows in it):
I'm pretty confident we can do better than that.